DETAILED ACTION 
Response to Arguments 

1. Applicant's arguments filed 12/04/2007 have been fully considered but they are 
not persuasive. 

In response to arguments (pages 11-13): 

Argument 1 (page 11 paragraph 4 - page 12 paragraph 1): 

• "Contrary to the Examiner's assertion, Lew does not disclose or suggest 
"constructing a transition window from an estimated bit time having a 
preamble sub-window, and at least one data sub-window...". In rejecting 
this aspect of the claimed principles, the Examiner cites Lew at Col. 1, 
lines 38-51, and Col 1, lines 51-62. These cited portions of Lew describe 
the structure of the AES signal format standard used for digitally encoding 
audio signals and transmitting the same. The present principles also 
works with such signals, and it is believed therein lies the confusion." 

Argument 2 (page 12 paragraph 4 - page 13 paragraph 1): 

• "In addition, the concept of using the "transition locations" in the serialized 
stream to extract data therefrom relative to the constructed transition 
window having a pre-amble sub window is also neither disclosed, nor 
suggested by the teachings of Lew. Figure 2 of Lew illustrates and 
AES/EBU form for digital audio streams. As discussed above, the present 
principles are operating on AES streams, so the disclosure of the format of 
such stream by Lew does not in any way anticipate the "transition 
locations" as claimed by applicant. In fact, Lew teaches away from the 
present principles by reciting that the PLLs utilize the bit transitions to 
establish the clock signals it will use for synchronization. (See Col 5, lines 
12-15). The remaining disclosure of Lew does not make any further 
mention of the bit transitions in the AES stream and/or the use of the 
same in extracting data from the serialized stream of digital audio that is 
already in the AES format. This is because Lew does not extract anything 
from the AES stream." 
Response to both arguments 1 and 2: 

Lew teaches a data stream with a well known AES/EBU format as 
illustrated in Fig. 2. Having compared figure 2 of Lew with the current application 
figures 6 and 7, Examiner takes the position that a window and a frame have the 
same function and are equally effective. Particularly, figure 2 of Lew shows start 
and stop bits within frames and sub frames of an AES format of a frame for an 
audio signal. The digital audio serial data is also bi-phase modulated for 
synchronization and clock extraction purposes. Within the frame of a digital 
stream, there exists various bits such as validity, user, and parity. The location of 
these bits renders how data can be extracted from a signal and provides 
synchronization. Lew describes bit fields in AES format where preamble fields 
34 and 36 provide synchronization and the identification of preambles for digital 
audio fields. This indicates the start of a block of frames, where user information 
is conveyed. The location of a sub frame plays an important role in establishing 
synchronization and where to extract information when transitioning from one sub 
frame/frame to another (col 4 lines 34-68). Additionally, Examiner takes the 
position that a transition location is a location where the status of bits changes 
within a frame. Lew teaches sub frame synchronization of preambles and bit 
transitions (col 6 lines 16-32). Examiner also takes the position that the 
extraction of information is taught by Lew, where not only clock information but 
also additional information, such as user information and other parameters relative to 
detection and transmission, is extracted. 

Claim Rejections - 35 USC § 102 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

3. Claims 1, 11-14, and 20-21 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Lew, US 5245667 A (hereinafter Lew). 

Re claims 1 and 11, Lew teaches a method of extracting digital audio data words 
from a serialized stream of digital audio data (col 4 lines 34-68 & Fig. 2), comprising: 

constructing a transition window from an estimated bit time for said serialized 
stream of digital audio data, said transition window having a preamble sub-window (col 
6 lines 16-32) and at least one data sub-window (col 4 lines 34-68 & Fig. 2); 

extracting plural digital audio data words from said serialized stream of digital 
audio (col. 1 line 22-28) based upon the location of each transition in said serialized 
stream of digital audio data relative to said preamble sub-window (col 6 lines 16-32) and 
said at least one data sub-window of said transition window (col 4 lines 34-68 & Fig. 2); 

each one of said extracted plural digital audio data words having a preamble (col 
6 lines 16-32) identifiable by a combination of at least one transition located in said 
preamble sub-window of said transition window and at least one transition located in 
said at least one data sub-window of said transition window (col 4 lines 34-68 & Fig. 2). 

Re claims 12 and 13, Lew teaches the method of claim 11, wherein said fast 
sample rate is at least about twenty times faster than a data rate for said serialized 
stream of digital audio data (Lew col 5 line 44-53). 

Re claims 14 and 21, Lew teaches the method of claim 13, wherein each one of 
said extracted plural digital audio data words has a preamble (col 6 lines 16-32) 
identifiable by a combination of at least one transition located in said preamble 
sub-window (col 6 lines 16-32) of said transition window and at least one transition 
located in said at least one data sub-window of said transition window (col 4 lines 
34-68 & Fig. 2). 

Re claim 20, Lew teaches a bi-phase decoder for 
use in decoding a stream of AES-3 digital audio data, comprising: 

a decoder circuit coupled to receive a stream of AES-3 digital audio data, an 
estimated bit time for said stream of AES-3 digital audio data (col 4 lines 34-68 & Fig. 2) 
and a fast clock, said fast clock having a frequency of about at least twenty times faster 
than a frequency of said stream of AES-3 digital audio data (Lew col 5 line 44-53); 

a data store (col 8 line 24-42) coupled to said decoder circuit, said data store 
receiving sub frames of digital audio data extracted from said stream of AES-3 digital 
audio data (col 4 lines 34-68 & Fig. 2) by said decoder circuit (Fig. 1 & col 3 line 55-65); 

said decoder circuit extracting sub frames of said digital audio data by 
constructing a transition window from said estimated bit time (col 4 lines 34-68 & Fig. 2), 
sampling said stream of AES-3 digital audio data using said fast clock (Lew col 5 line 
44-53) and applying said sampled stream of AES-3 digital audio data to said transition 
window to identify transitions (col 6 lines 16-32), in said sampled stream of AES-3 digital 
audio data, indicative of preambles of said sub frames of digital audio data (col 4 lines 
34-68 & Fig. 2). 
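
For illustration only, the claimed steps recited above (sampling with a fast clock, locating transitions, and comparing their spacing against a transition window built from the estimated bit time) can be sketched as follows. This is a minimal sketch and is not taken from Lew or from the application. The function names are hypothetical. The data sub-window bounds of 1/4, 3/4 and 1 1/4 bit times follow the scaling of about .25 to 1.25 discussed for claims 5-7, and the preamble sub-window of 1 1/4 to 1 3/4 bit times is an assumption based on the 1.5 bit time pulses that mark AES-3 preambles.

```python
from typing import List


def find_transitions(samples: List[int]) -> List[int]:
    """Return the sample indices at which the oversampled line level changes."""
    return [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]


def classify_interval(interval: float, bit_time: float) -> str:
    """Place the spacing between two transitions into a sub-window.

    The bounds are assumptions for this sketch, expressed as multiples
    of the estimated bit time.
    """
    ratio = interval / bit_time
    if 0.25 <= ratio < 0.75:
        return "data1"     # about half a bit time between transitions
    if 0.75 <= ratio < 1.25:
        return "data2"     # about one full bit time between transitions
    if 1.25 <= ratio < 1.75:
        return "preamble"  # about 1.5 bit times, not produced by bi-phase data
    return "invalid"


def classify_stream(samples: List[int], bit_time: float) -> List[str]:
    """Classify every interval between successive transitions."""
    t = find_transitions(samples)
    return [classify_interval(b - a, bit_time) for a, b in zip(t, t[1:])]


# Fast clock assumed to run at 20 samples per bit, so bit_time is 20 samples.
samples = [0] * 5 + [1] * 30 + [0] * 10 + [1] * 20 + [0] * 5
labels = classify_stream(samples, bit_time=20.0)
```

In this sketch, a spacing that falls in the preamble sub-window marks the start of a sub frame, and the pattern of data sub-window hits that follows would distinguish one preamble type from another.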

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Application/Control Number: 10/519,000 
Art Unit: 2626 

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 
USPQ 459 (1966), that are applied for establishing a background for determining 
obviousness under 35 U.S.C. 103(a) are summarized as follows: (See MPEP Ch. 
2141) 

a. Determining the scope and contents of the prior art; 

b. Ascertaining the differences between the prior art and the claims in issue; 

c. Resolving the level of ordinary skill in the pertinent art; and 

d. Evaluating evidence of secondary considerations for indicating 
obviousness or nonobviousness. 

5. Claims 2-4, 8, 15-18, and 22 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Lew US 5245667 A (hereinafter Lew) in view of Gillick et al US 
4837831 A (hereinafter Gillick). 

Re claims 2 and 15, Lew teaches a pair of successive transitions (col 8 lines 
24-42) located in said preamble sub-window followed by a pair of successive transitions 
located in said at least one data sub-window (col 4 lines 34-68 & Fig. 2). 

However, Lew fails to teach the method of claim 1, and further comprising 
identifying said extracted data words (Gillick col 8 lines 5-15) as having a first type of 
preamble if said extracted data words have a pair of successive transitions (Gillick col 8 
lines 16-37). 

Gillick teaches that, after the acquisition of multiple utterances of each vocabulary 
word, method 100 advances to step 106. This step performs a plurality of substeps 
108, 110, and 112 for each word in the vocabulary. The first of these substeps, step 
108, itself comprises two substeps, 114 and 116, which are performed for each 
utterance of each word. Step 114 finds an anchor for each utterance, that is, the first 
location in the utterance at which it has attained a certain average threshold amplitude. 
Step 116 calculates five smoothed frames for each utterance, positioned relative to its 
anchor. Additionally, Gillick teaches, in reference to figures 4 and 5, that FIG. 4 
schematically represents how such smoothed frames are calculated. A smoothed 
frame 118 is calculated from five individual frames 104A-104E, of the type described 
above with regard to FIG. 3. According to this process, each pair of successive 
individual frames 104 are averaged, to form one second level frame 120. Thus the 
individual frames 104A and 104B are averaged to form the second level frame 120A, 
and the individual frames 104B and 104C are averaged to form the second level frame 
120B, and so on, as is shown in FIG. 4. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to extract pairs of successive sub windows containing words. 
Identifying and extracting successive frames with words in them would allow for the 
detection and location of multiple utterances within an audio signal. The repetition of an 
utterance is not limited to adjacent words and can have a separation between the 
repeated words, where transitions from a repeated word to the next can be smoothed 
and the location of occurrence can be stored in memory so as to detect a repeating 
segment of multiple words (i.e. chorus). 

Re claims 3 and 16, Lew teaches a preamble sub-window separated by a pair of 
successive transitions located in said at least one data sub-window. 

However, Lew fails to teach the method of claim 2, and further comprising 
identifying said extracted data words as having a second type of preamble if said 
extracted data words (Gillick col 8 lines 5-15) have a pair of non-successive transitions 
(Gillick col 8 lines 16-37). 

Gillick teaches that, after the acquisition of multiple utterances of each vocabulary 
word, method 100 advances to step 106. This step performs a plurality of substeps 
108, 110, and 112 for each word in the vocabulary. The first of these substeps, step 
108, itself comprises two substeps, 114 and 116, which are performed for each 
utterance of each word. Step 114 finds an anchor for each utterance, that is, the first 
location in the utterance at which it has attained a certain average threshold amplitude. 
Step 116 calculates five smoothed frames for each utterance, positioned relative to its 
anchor. Additionally, Gillick teaches, in reference to figures 4 and 5, that FIG. 4 
schematically represents how such smoothed frames are calculated. A smoothed 
frame 118 is calculated from five individual frames 104A-104E, of the type described 
above with regard to FIG. 3. According to this process, each pair of successive 
individual frames 104 are averaged, to form one second level frame 120. Thus the 
individual frames 104A and 104B are averaged to form the second level frame 120A, 
and the individual frames 104B and 104C are averaged to form the second level frame 
120B, and so on, as is shown in FIG. 4. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to extract pairs of successive sub windows containing words. 
Identifying and extracting successive frames with words in them would allow for the 
detection and location of multiple utterances within an audio signal. The repetition of an 
utterance is not limited to adjacent words and can have a separation between the 
repeated words, where transitions from a repeated word to the next can be smoothed 
and the location of occurrence can be stored in memory so as to detect a repeating 
segment of multiple words (i.e. chorus). 

Re claims 4 and 17, Lew teaches the method of claim 3, and further comprising 
identifying said extracted data words as having a third type of preamble (col 6 lines 16- 
32) if said extracted data words have a transition located in said preamble sub-window 
followed by first, second and third transitions located in said at least one data sub- 
window (col 4 lines 34-68 & Fig. 2). 

Re claims 8 and 18, Lew teaches the method of claim 1, wherein said estimated 
bit time is derived from said serialized stream of digital audio data (col 4 lines 34-68 & 
Fig. 2). 

Re claim 22, Lew teaches the apparatus of claim 21, and further comprising a bit 
time estimator circuit having an input coupled to receive said stream of AES-3 digital 
audio data (col 4 lines 34-68 & Fig. 2) and an output coupled to said decoder circuit (col 
4 lines 34-68 & Fig. 3), said bit time estimator determining said estimated bit time for 
output to said decoder circuit (col 4 lines 34-68 & Fig. 3). 

6. Claims 5-7 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Lew US 5245667 A (hereinafter Lew) in view of Gillick et al US 4837831 A 
(hereinafter Gillick) and further in view of Akagiri US 5490130 (hereinafter 
Akagiri). 

Re claims 5-7, Lew teaches the method of claim 4, wherein said transition 
window is constructed such that said at least one data sub-window includes a first data 
sub-window (col 4 lines 34-68 & Fig. 2). 

However, Lew in view of Gillick fails to teach a sub-window which extends from 
about 1/4 times said estimated bit time to about 3/4 times said estimated bit time and a 
second data sub window which extends from about 3/4 times said estimated bit time to 
about 1 1/4 times said estimated bit time (Akagiri col 15 lines 7-20). 

NOTE: The use of "about" is construed to be an estimate with no fixed range or 
deviation limitation, where 1.5 or even 2 can be considered close to .5 without a specific 
variation constraint, and is therefore construed to be functionally equivalent to a scaling of 
.25 or .5. 

Akagiri teaches that each frequency range signal is then divided in time into blocks to 
which block floating processing and orthogonal transform processing is applied. The 
block length decision circuit 45 adaptively determines the block length of the blocks in 
each of the frequency ranges according to dynamic characteristics of the digital input 
signal. The digital input signal is notionally divided in time into frames. Then, after the 
digital input signal is divided into plural frequency range signals, each frequency range 
signal is divided into the blocks in which the frequency range signal will be orthogonally 
transformed. Each block corresponds to a frame or an integral fraction (e.g., 1/2, 1/4) of 
a frame. Thus, the maximum block length in which each frequency range signal is 
orthogonally transformed is equal to the frame length. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to use a window or frame that is scaled by about .25 to 1.25. Using 
a scaled value for a block length of a frame allows for orthogonal constraints to be met 
during the transformation of a signal from the time to frequency range so as not to overlap 
data between adjacent frames by extending/shortening a frame. 



7. Claims 9-10 and 19 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Lew US 5245667 A (hereinafter Lew) in view of Gillick et al US 
4837831 A (hereinafter Gillick) and further in view of Tackin US 7180892 
(hereinafter Tackin). 

Re claims 9-10 and 19, Lew teaches the method of claim 18, and further 
comprising: 

identifying transitions in said serialized stream of digital audio data which occur 
within said constructed bit window (col 4 lines 34-68 & Fig. 2), 

However, Lew fails to teach the time separating a set of successive identified 
transitions being a measurement of said estimated bit time (Gillick col 8 lines 5-37). 

Gillick teaches that, after the acquisition of multiple utterances of each vocabulary 
word, method 100 advances to step 106. This step performs a plurality of substeps 
108, 110, and 112 for each word in the vocabulary. The first of these substeps, step 
108, itself comprises two substeps, 114 and 116, which are performed for each 
utterance of each word. Step 114 finds an anchor for each utterance, that is, the first 
location in the utterance at which it has attained a certain average threshold amplitude. 
Step 116 calculates five smoothed frames for each utterance, positioned relative to its 
anchor. Additionally, Gillick teaches, in reference to figures 4 and 5, that FIG. 4 
schematically represents how such smoothed frames are calculated. A smoothed 
frame 118 is calculated from five individual frames 104A-104E, of the type described 
above with regard to FIG. 3. According to this process, each pair of successive 
individual frames 104 are averaged, to form one second level frame 120. Thus the 
individual frames 104A and 104B are averaged to form the second level frame 120A, 
and the individual frames 104B and 104C are averaged to form the second level frame 
120B, and so on, as is shown in FIG. 4. 

However, Lew in view of Gillick fails to teach estimating minimum and maximum 
bit window times; constructing a bit window from said minimum and maximum bit 
window times (Tackin col 36 lines 15-31); and 

determining said estimated bit time from a running average of plural 
measurements of said estimated bit time (Tackin col 26 line 53 - col 27 line 9). 

Tackin teaches that the voice synchronizer should operate with or without sequence 
numbers, time stamps, and SID packets. The voice synchronizer should also operate 
with voice packets arriving out of order and lost voice packets. In addition, the voice 
synchronizer preferably provides a variety of configuration parameters which can be 
specified by the host for optimum performance, including minimum and maximum target 
holding time. With these two parameters, it is possible to use a fully adaptive jitter 
buffer by setting the minimum target holding time to zero msec and the maximum target 
holding time to 500 msec (or the limit imposed due to memory constraints). Although 
the preferred voice synchronizer is fully adaptive and able to adapt to varying network 
conditions, those skilled in the art will appreciate that the voice synchronizer can also be 
maintained at a fixed holding time by setting the minimum and maximum holding times 
to be equal. These estimates are periodically quantized and transmitted in a SID packet 
by the comfort noise estimator (usually at the end of a talk spurt and periodically during 
the ensuing silent segment, or when the background noise parameters change 
appreciably). The comfort noise estimator 81 should update the long running averages, 
when necessary, decide when to transmit a SID packet, and quantize and pass the 
quantized parameters to the packetization engine 78. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to extract pairs of successive sub windows containing words. 
Identifying and extracting successive frames with words in them would allow for the 
detection and location of multiple utterances within an audio signal. The repetition of an 
utterance is not limited to adjacent words and can have a separation between the 
repeated words, where transitions from a repeated word to the next can be smoothed 
and the location of occurrence can be stored in memory so as to detect a repeating 
segment of multiple words (i.e. chorus). 

It would also have been obvious to one of ordinary skill in the art at the time of 
the invention to extract pairs of successive windows using a running average and 
defining a minimum and maximum bit window. Using a maximum and minimum time 
would allow for a buffer having a reduced amount of jitter when extracting and quantizing 
information from a signal. Additionally, it is well known to apply a running/moving average 
to time-based data, where data can be smoothed, reducing the number of fluctuations 
based on a maximum and minimum period. 
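
For illustration only, the claimed combination of a bit window bounded by minimum and maximum times and a running average of bit time measurements can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Lew, Gillick, Tackin, or the application. The class name, the tolerance of 25 percent around the current estimate, and the history length are all hypothetical.

```python
from collections import deque


class BitTimeEstimator:
    """Running-average bit time estimate gated by a min/max bit window."""

    def __init__(self, initial: float, tolerance: float = 0.25, history: int = 8):
        self.estimate = float(initial)
        self.tolerance = tolerance  # assumed half-width of the bit window
        self.measurements = deque([float(initial)], maxlen=history)

    def window(self):
        """Return the (minimum, maximum) bit window around the current estimate."""
        return (self.estimate * (1 - self.tolerance),
                self.estimate * (1 + self.tolerance))

    def update(self, interval: float) -> float:
        """Fold in one transition spacing if it falls inside the bit window."""
        low, high = self.window()
        if low <= interval <= high:
            self.measurements.append(float(interval))
            self.estimate = sum(self.measurements) / len(self.measurements)
        return self.estimate


est = BitTimeEstimator(20.0, history=4)
first = est.update(22)   # inside the window, averaged with the initial value
second = est.update(40)  # outside the window, ignored
third = est.update(18)   # inside the window, averaged with the earlier values
```

Spacings outside the window are treated here as something other than a single bit time (for example a preamble pulse or noise) and leave the estimate unchanged.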

Conclusion 

8. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Michael C. Colucci whose telephone number is 
(571)-270-1847. The examiner can normally be reached on 7:30 am - 5:00 pm, 
Monday-Friday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on (571)-272-7332. The fax phone 
number for the organization where this application or proceeding is assigned is 
571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



Michael Colucci Jr. 
Patent Examiner 
AU 2626 
(571)-270-1847 
Michael.Colucci@uspto.gov 



